mpi4py: enable spawn tests workflow by default #12591
I vote for merging the tests into one GitHub Action.
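For context, merging means running all of the mpi4py test stages as steps of a single job instead of separate workflows. A minimal sketch, assuming illustrative names and a simplified build; the real files under .github/workflows/ differ:

```yaml
# Hypothetical merged workflow -- names, paths, and test commands are
# illustrative, not the actual ompi workflow file.
name: mpi4py-tests
on: [pull_request]
jobs:
  mpi4py:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and install Open MPI from the PR source
        run: |
          ./autogen.pl && ./configure --prefix="$PWD/install" && make -j "$(nproc)" install
          echo "$PWD/install/bin" >> "$GITHUB_PATH"
      - name: Run the mpi4py test suite   # assumes an mpi4py checkout (elided)
        run: mpirun -np 2 python test/main.py
      - name: Run the spawn tests
        run: mpirun -np 2 python test/test_spawn.py
```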
@jsquyres @hppritcha Merged workflows as requested.
@hppritcha I merged #12526 and rebased this PR.
@jsquyres Do you have more comments? Merge is blocked until you approve the change.
If we're not splitting up into multiple runs on the same build any more, is there any reason to split into 2 jobs (…)?
@jsquyres Good point. I merged both (…).
Much better, thanks. Do we really need to test all of these cases? I ask because the whole thing takes nearly 30 minutes. Is there value in all of these, or should we cut some of them / make the overall run shorter? E.g., the final np=5 run with the relocated OMPI -- does that need to be with np=5? Or do we even need that at all? I.e., if singleton works with relocated, do we test anything new with np=5?
Honestly I don't think it's that bad - Jenkins is currently the long pole at 50 mins, so running mpi4py tests for 30 mins sounds benign to me (especially considering the bugs it has revealed in the past). I'm not sure if np=5 is too big or small... if I were to write the test I might test the edge cases, e.g. singleton, np=3, np=total slots. @dalcinl What do you think?
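To illustrate the edge-case idea: a matrix covering only singleton, a mid-range np, and the runner's slot count. A sketch assuming a 4-vCPU runner and the same hypothetical test command as above:

```yaml
jobs:
  spawn:
    runs-on: ubuntu-latest   # public GitHub runners expose ~4 vCPUs
    strategy:
      matrix:
        np: [3, 4]           # mid-range, and np=4 == total slots
    steps:
      - name: Singleton spawn test (no mpirun)
        # Runs once per matrix entry in this sketch; could be its own job.
        run: python test/test_spawn.py
      - name: Spawn test at np=${{ matrix.np }}
        run: mpirun -np ${{ matrix.np }} python test/test_spawn.py
```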
Have you watched to see whether the np=5 stage is the slow one?
@hppritcha The timing is in https://github.com/open-mpi/ompi/actions/runs/9322808384/job/25664637467?pr=12591 -- np=5 takes > 4 mins.
I thought the VM where these GitHub Actions run has only 4 "cpus". Maybe adding (…)
The goal with np=5 might be to intentionally stress the oversubscribed scenario -- and that's probably a good thing. But do we need to do that with the relocated OMPI? I don't think so. At least for the relocated OMPI, it'll either work or it won't -- I think testing with either singleton or np=1 will tell us if it works, and that's good enough. Right? For the other runs -- I think running singleton, np=5, and np=something_else would probably be good. Do we need all of np=1, np=2, np=3, np=4? I.e., does running all of the np values tell us something that not running all of them wouldn't tell us?
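On the oversubscription point: Open MPI's mpirun refuses to start more ranks than the detected slots by default, so an np=5 run on a 4-vCPU runner has to allow it explicitly. A minimal sketch (the step name and test command are illustrative; `--oversubscribe` is a standard Open MPI mpirun option):

```yaml
- name: Oversubscribed spawn test (np=5 on a 4-vCPU runner)
  # Without --oversubscribe, mpirun fails with a "not enough slots
  # available" error when -np exceeds the detected slot count.
  run: mpirun -np 5 --oversubscribe python test/test_spawn.py
```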
I don't think we need np=5 for the relocated binary. I agree with @jsquyres about that one.
The previous inter-communicator race condition has been fixed. Enable the workflow by default to catch new regressions.
Signed-off-by: Wenduo Wang <wenduwan@amazon.com>
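For reference, "enable the workflow by default" in GitHub Actions terms means the triggers fire automatically rather than only on demand. A hedged sketch of such a trigger change, assuming the workflow was previously manual-only (the actual diff in this PR may differ):

```yaml
on:
  # before (assumed): manual runs only
  # workflow_dispatch:
  # after: run automatically on every push and pull request
  push:
  pull_request:
```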
Ack. Removed np=5 relocate test.
@wenduwan I still have to make changes on mpi4py's side so the spawn tests are not skipped under ompi@main. I'm doing final testing here, and if that goes well, I'll merge mpi4py/mpi4py@54c0cf3. Please keep me posted once all these spawn fixes land in v5.0.x; we will have to update things again in both mpi4py and ompi.